Mosaic and heatmap for three distinctive albums

                             Truth
Prediction                   Continuum  Room For Squares  The Search for Everything
  Continuum                          2                 1                          2
  Room For Squares                   4                12                          2
  The Search for Everything          6                 0                          8

# A tibble: 3 x 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy multiclass     0.595
2 kap      multiclass     0.388
3 j_index  macro          0.382
# A tibble: 3 x 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy multiclass     0.568
2 kap      multiclass     0.349
3 j_index  macro          0.341
$Continuum
35 x 1 sparse Matrix of class "dgCMatrix"
                          1
(Intercept)      -0.5294009
danceability     -0.6219245
energy            .        
loudness         -2.8225666
speechiness       0.5421036
acousticness      .        
instrumentalness  .        
liveness          .        
valence           .        
tempo             1.5544615
duration          0.5740248
C                 .        
`C#|Db`           .        
D                 .        
`D#|Eb`           1.5222356
E                 .        
F                 .        
`F#|Gb`          -1.7037551
G                 .        
`G#|Ab`           .        
A                 0.9029103
`A#|Bb`           .        
B                 .        
c01              -5.2891822
c02               .        
c03               .        
c04              -0.3137720
c05               .        
c06               .        
c07               .        
c08               .        
c09               .        
c10               4.6280590
c11               .        
c12               .        

$`Room For Squares`
35 x 1 sparse Matrix of class "dgCMatrix"
                           1
(Intercept)      -0.01477458
danceability      .         
energy            .         
loudness          .         
speechiness       .         
acousticness     -0.91673341
instrumentalness  .         
liveness          .         
valence           .         
tempo             .         
duration          .         
C                 .         
`C#|Db`           0.22777826
D                 .         
`D#|Eb`           .         
E                 .         
F                 .         
`F#|Gb`           .         
G                 .         
`G#|Ab`           .         
A                 .         
`A#|Bb`           .         
B                 .         
c01               .         
c02               .         
c03               .         
c04               .         
c05               .         
c06               .         
c07               .         
c08               .         
c09              -4.79879773
c10               .         
c11               .         
c12               .         

$`The Search for Everything`
35 x 1 sparse Matrix of class "dgCMatrix"
                           1
(Intercept)       0.54417548
danceability      .         
energy            .         
loudness          .         
speechiness       .         
acousticness      2.73740138
instrumentalness  0.22978022
liveness          .         
valence           0.19828772
tempo             .         
duration          .         
C                 .         
`C#|Db`           .         
D                 .         
`D#|Eb`          -4.13621947
E                 .         
F                 1.42196539
`F#|Gb`           .         
G                 .         
`G#|Ab`           .         
A                -0.51648033
`A#|Bb`           3.17780521
B                 .         
c01               .         
c02               .         
c03               .         
c04               0.33928402
c05               .         
c06               4.87299251
c07              -0.06887533
c08               .         
c09               .         
c10               .         
c11               .         
c12               .         
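The three coefficient vectors above look like the output of a multinomial lasso (glmnet). Each album gets its own sparse linear predictor, and a track is assigned to the album whose predictor is largest. The Python sketch below copies the non-zero coefficients from the output and applies the softmax that a multinomial logistic model uses to turn the three linear predictors into class probabilities. It assumes the input features are on the same scale as during fitting (they appear to be z-scored), and it treats any feature that is not supplied as 0. The function names are hypothetical.

```python
import math

# Non-zero coefficients copied from the sparse matrices above;
# every coefficient printed as "." is zero and is therefore omitted.
COEFFICIENTS = {
    "Continuum": {
        "(Intercept)": -0.5294009,
        "danceability": -0.6219245,
        "loudness": -2.8225666,
        "speechiness": 0.5421036,
        "tempo": 1.5544615,
        "duration": 0.5740248,
        "D#|Eb": 1.5222356,
        "F#|Gb": -1.7037551,
        "A": 0.9029103,
        "c01": -5.2891822,
        "c04": -0.3137720,
        "c10": 4.6280590,
    },
    "Room For Squares": {
        "(Intercept)": -0.01477458,
        "acousticness": -0.91673341,
        "C#|Db": 0.22777826,
        "c09": -4.79879773,
    },
    "The Search for Everything": {
        "(Intercept)": 0.54417548,
        "acousticness": 2.73740138,
        "instrumentalness": 0.22978022,
        "valence": 0.19828772,
        "D#|Eb": -4.13621947,
        "F": 1.42196539,
        "A": -0.51648033,
        "A#|Bb": 3.17780521,
        "c04": 0.33928402,
        "c06": 4.87299251,
        "c07": -0.06887533,
    },
}


def album_probabilities(features):
    """Softmax over the per-album linear predictors.

    `features` maps feature names to (assumed z-scored) values;
    features that are not given are treated as 0.
    """
    eta = {}
    for album, coefs in COEFFICIENTS.items():
        eta[album] = coefs["(Intercept)"] + sum(
            beta * features.get(name, 0.0)
            for name, beta in coefs.items()
            if name != "(Intercept)"
        )
    # Subtract the largest predictor before exponentiating to avoid overflow.
    top = max(eta.values())
    weights = {album: math.exp(value - top) for album, value in eta.items()}
    total = sum(weights.values())
    return {album: w / total for album, w in weights.items()}


def predict_album(features):
    """Album with the highest predicted probability."""
    probs = album_probabilities(features)
    return max(probs, key=probs.get)
```

When every feature is at its average (all zeros), only the intercepts matter, so the sketch predicts The Search for Everything. A strongly negative `c09` pushes the prediction towards Room For Squares. This agrees in direction with the first split of the C5.0 tree further below.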
# A tibble: 3 x 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy multiclass     0.649
2 kap      multiclass     0.471
3 j_index  macro          0.466

Call:
C5.0.default(x = x, y = y, trials = 1, control = C50::C5.0Control(minCases =
 2, sample = 0))


C5.0 [Release 2.07 GPL Edition]     Sun Mar 22 23:58:18 2020
-------------------------------

Class specified by attribute `outcome'

Read 37 cases (35 attributes) from undefined.data

Decision tree:

c09 <= -0.1415127: Room For Squares (14/1)
c09 > -0.1415127:
:...`D#\|Eb` <= -1.062942: The Search for Everything (4)
    `D#\|Eb` > -1.062942:
    :...liveness <= -0.3603865: Continuum (10/1)
        liveness > -0.3603865:
        :...liveness <= 0.567418: The Search for Everything (6)
            liveness > 0.567418: Continuum (3)


Evaluation on training data (37 cases):

        Decision Tree   
      ----------------  
      Size      Errors  

         5    2( 5.4%)   <<


       (a)   (b)   (c)    <-classified as
      ----  ----  ----
        12                (a): class Continuum
              13          (b): class Room For Squares
         1     1    10    (c): class The Search for Everything


    Attribute usage:

    100.00% c09
     62.16% `D#\|Eb`
     51.35% liveness


Time: 0.0 secs
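The tree that C5.0 printed is small enough to read as a handful of nested conditions. The Python sketch below copies the split thresholds exactly as printed. Note that C5.0 writes the attribute `D#|Eb` with an escaped pipe, as `D#\|Eb`. The sketch also recomputes the training error from the confusion matrix in the summary. It only replays the fitted tree and does not train one, and the function names are hypothetical.

```python
def c50_tree(track):
    """Replay the decision tree printed by C5.0 above.

    `track` maps feature names to values on the same (apparently
    standardized) scale that was used to grow the tree.
    """
    if track["c09"] <= -0.1415127:
        return "Room For Squares"
    if track["D#|Eb"] <= -1.062942:
        return "The Search for Everything"
    if track["liveness"] <= -0.3603865:
        return "Continuum"
    if track["liveness"] <= 0.567418:
        return "The Search for Everything"
    return "Continuum"


def training_error(confusion):
    """Error rate from a confusion matrix given as nested lists
    (rows = true class, columns = predicted class)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Training confusion matrix from the C5.0 summary, rows and columns in the
# order Continuum, Room For Squares, The Search for Everything.
TRAINING = [
    [12, 0, 0],
    [0, 13, 0],
    [1, 1, 10],
]
```

The error rate comes out at 2/37 ≈ 5.4%, which matches the `2( 5.4%)` line above. That figure is measured on the training data, so it is an optimistic estimate.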
# A tibble: 3 x 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy multiclass     0.676
2 kap      multiclass     0.512
3 j_index  macro          0.506

# A tibble: 3 x 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy multiclass     0.595
2 kap      multiclass     0.389
3 j_index  macro          0.383


These are the mosaic plot and the heatmap for three distinctive John Mayer albums. Room For Squares is his first album and contains the biggest outlier in track popularity. The Search for Everything is the most recent album, and Continuum, released in between, contains three very popular tracks. The plots compare each track's true album with the album the classifier predicted. Because it took me a while to get everything working properly, I did not have much time left to examine what these plots reveal or to add more information about the other plots.
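The three metrics in the first table can be recomputed from the confusion matrix at the top of this section, where rows are predictions and columns are the true albums. The Python sketch below computes three values:

- accuracy;
- Cohen's kappa;
- the macro-averaged Youden's J, which is sensitivity + specificity - 1 for each album treated one-vs-rest, then averaged.

This is how I understand yardstick's `j_index` with `macro` averaging. The function names are hypothetical.

```python
# Confusion matrix from the mosaic/heatmap above:
# rows = predicted album, columns = true album, both in the order
# Continuum, Room For Squares, The Search for Everything.
CONFUSION = [
    [2, 1, 2],
    [4, 12, 2],
    [6, 0, 8],
]


def accuracy(cm):
    """Share of tracks on the diagonal (correctly classified)."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohens_kappa(cm):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(cm)
    total = sum(map(sum, cm))
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    observed = accuracy(cm)
    expected = sum(row_sums[k] * col_sums[k] for k in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


def macro_j_index(cm):
    """Mean over classes of sensitivity + specificity - 1 (one-vs-rest)."""
    n = len(cm)
    total = sum(map(sum, cm))
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[r][k] for r in range(n)) - tp  # truly k, predicted other
        fp = sum(cm[k]) - tp                       # predicted k, truly other
        tn = total - tp - fn - fp
        scores.append(tp / (tp + fn) + tn / (tn + fp) - 1)
    return sum(scores) / n


for name, metric in [("accuracy", accuracy), ("kap", cohens_kappa), ("j_index", macro_j_index)]:
    print(f"{name:9}{metric(CONFUSION):.3f}")
```

The output reproduces the first table (0.595, 0.388 and 0.382). This suggests that the table belongs to the model shown in the mosaic plot and heatmap.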


Hierarchical or k-means clustering

K-means clustering with 4 clusters of sizes 2, 3, 4, 3

Cluster means:
  danceability     energy   loudness speechiness acousticness instrumentalness
1    0.4483754 -0.5055769 -0.8542989  -0.2617314    0.4311570       -0.3560762
2    1.1628505 -0.3483975 -0.5210990  -0.3347407    0.3168866       -0.3472233
3   -0.6672185  1.2108816  1.1846645   0.6608718   -1.2083289        0.3202983
4   -0.5721427 -0.9290601 -0.4889211  -0.3719341    1.0067805        0.1575429
     liveness    valence      tempo   duration          C       C#|Db
1  0.72504390 -0.5811429 -0.4569806  0.3095383 -0.6452313 -0.06133185
2 -0.49929630  0.5065896 -0.4990188  0.2360117  0.4125469  0.23265043
3 -0.01787979  0.5472212  0.4486234  0.3466649 -0.4925857  0.28181579
4  0.03977342 -0.8487893  0.2055080 -0.9045904  0.6743883 -0.56751692
            D      D#|Eb           E           F      F#|Gb          G
1 -0.33204926 -0.7399263  0.09467322  1.30255090  1.9057041  0.3395602
2 -0.01792557  1.2025659 -1.31851366  0.04740666 -0.8564878 -0.5684698
3 -0.08188922 -0.1916162  0.05872010 -0.16745673 -0.2575904 -0.5981355
4  0.34847737 -0.4537935  1.17710471 -0.69249828 -0.0705277  1.1396102
        G#|Ab           A      A#|Bb           B        c01        c02
1 -0.65154940 -0.46684298 -0.6774957 -0.11841796 -1.0827059 -0.3413168
2  1.18316804 -0.32349614  0.4445460 -0.90263985 -0.3498639 -0.3604129
3 -0.08164101  0.09139234  0.6927170  0.75363901  1.1463431  1.0878198
4 -0.63994710  0.51286834 -0.9165048 -0.02326685 -0.4567895 -0.8624690
         c03        c04        c05        c06        c07        c08        c09
1 -0.8658813 -1.0580524  0.7167423 -0.6281798  0.1946347 -0.8353770  1.1322554
2 -0.9037814 -0.4057427  0.1770408  1.0089714 -0.4854060 -0.7882692  0.2083610
3  0.3173946  0.4510403 -0.9921773 -0.6707792  0.6896874  0.7166807 -0.2957887
4  1.0578427  0.5097239  0.6680341  0.3041874 -0.5639337  0.3896130 -0.5688130
         c10        c11         c12
1 -0.3699948  0.2827328  1.43603416
2 -0.3740679 -1.2417615  0.04227002
3  1.0470441  0.3304235 -0.36506877
4 -0.7753277  0.6127083 -0.51286777

Clustering vector:
Waiting On the Wo... I Don't Trust Mys...               Belief 
                   3                    2                    3 
             Gravity    The Heart of Life             Vultures 
                   1                    4                    2 
     Stop This Train Slow Dancing in a...         Bold as Love 
                   1                    2                    3 
Dreaming with a B...            In Repair I'm Gonna Find An... 
                   4                    3                    4 

Within cluster sum of squares by cluster:
[1] 22.16219 56.02710 71.22445 40.35515
 (between_SS / total_SS =  49.3 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      
[1] 432.0 345.6
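R's `kmeans()` summarises the quality of a clustering as `between_SS / total_SS`, the share of the total sum of squares that is explained by differences between the cluster centres. The Python sketch below computes the three components listed above (`totss`, `withinss`, `betweenss`) for any set of points and cluster labels. It does not run k-means itself.

The second part checks the printed 49.3 % under an assumption that the output does not state. The assumption is that the 12 tracks were described by 34 z-scored features, which would make `totss` equal (12 - 1) × 34 = 374.

```python
def sum_of_squares(points, labels):
    """Return (totss, withinss per cluster, betweenss), as reported by R's kmeans()."""
    dims = len(points[0])

    def centroid(group):
        return [sum(p[d] for p in group) / len(group) for d in range(dims)]

    def ss(group, centre):
        return sum((p[d] - centre[d]) ** 2 for p in group for d in range(dims))

    grand = centroid(points)
    totss = ss(points, grand)
    withinss, betweenss = [], 0.0
    for c in sorted(set(labels)):
        group = [p for p, lab in zip(points, labels) if lab == c]
        centre = centroid(group)
        withinss.append(ss(group, centre))
        # Each cluster adds its size times the squared distance
        # between its centre and the overall centre.
        betweenss += len(group) * sum((centre[d] - grand[d]) ** 2 for d in range(dims))
    return totss, withinss, betweenss


# Within-cluster sums of squares printed above.
WITHINSS = [22.16219, 56.02710, 71.22445, 40.35515]
# Assumption: 12 tracks and 34 z-scored features, so each feature
# contributes (12 - 1) * 1 to the total sum of squares.
TOTSS = (12 - 1) * 34
print(f"between_SS / total_SS = {100 * (1 - sum(WITHINSS) / TOTSS):.1f} %")
```

Under that assumption, 1 - 189.77 / 374 ≈ 0.493, which agrees with the printed ratio. For any clustering, `totss` equals the sum of `withinss` plus `betweenss`.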

Novelty function, chromagram and cepstrogram


Verse: 87-107 sec

Pre-chorus: 108-126 sec

Chorus: 127-143 sec

Bridge: 144-182 sec

What stands out in this novelty function is that the chorus actually has the smallest peaks. The verse shows somewhat more peaks, and the bridge shows even more than the verse. However, the most numerous and highest peaks come after the bridge. This is remarkable, since nothing special happens there in this song: it is just a short intermezzo that builds up to the next chorus. The chromagram and timbre cepstrogram do not reveal many important details about the structure and melody of this song. However, the chromagram does clearly show that the D (and, more specifically, the D-minor chord) is very prominent in the bridge section.
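A novelty function is a curve whose peaks mark moments where the music changes. The Python sketch below uses one simple and common definition: the half-wave rectified frame-to-frame increase in loudness. It then counts the peaks that fall inside the section boundaries listed above. This is an assumed stand-in, because the novelty function in the plot may have been computed differently (for example from timbre or chroma features). The function names, the `threshold` parameter and the input format (parallel lists of frame times in seconds and loudness values) are hypothetical.

```python
# Section boundaries in seconds, as listed above.
SECTIONS = {
    "Verse": (87, 107),
    "Pre-chorus": (108, 126),
    "Chorus": (127, 143),
    "Bridge": (144, 182),
}


def loudness_novelty(loudness):
    """Half-wave rectified first difference: only increases count as novelty."""
    return [0.0] + [max(0.0, b - a) for a, b in zip(loudness, loudness[1:])]


def peak_indices(curve, threshold=0.0):
    """Indices of local maxima that rise above `threshold`."""
    return [
        i
        for i in range(1, len(curve) - 1)
        if curve[i] > threshold and curve[i - 1] < curve[i] >= curve[i + 1]
    ]


def peaks_per_section(times, loudness, threshold=0.0):
    """Count novelty peaks whose time falls inside each labelled section."""
    novelty = loudness_novelty(loudness)
    peak_times = [times[i] for i in peak_indices(novelty, threshold)]
    return {
        name: sum(start <= t <= end for t in peak_times)
        for name, (start, end) in SECTIONS.items()
    }
```

Comparing such counts, or the peak heights, across sections is one way to make the observation about the chorus and the bridge more concrete.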


Tempogram

Unfortunately, the code for my tempogram stopped working when I got home and tried to work on the assignment. If I can get the tempogram working, I might include it in my portfolio.


Variation in tempo for the first and most recent album


This visualisation shows the variation in tempo for the first album, Room For Squares, and the most recent album, The Search for Everything. The x-axis shows the mean tempo in beats per minute (bpm), and the y-axis shows the standard deviation of the tempo. The size of each point shows the duration of the song, the color shows which album the song belongs to, and the transparency shows its loudness in dBFS. The tracks on The Search for Everything seem to be more tightly clustered, with a few outliers along the y-axis, whereas the tracks on Room For Squares are more spread out. Besides this, there is no big difference in track duration between the two albums. Loudness, however, shows more of a difference. The difference between the two albums does not seem that big, but the tracks on Room For Squares differ more clearly from one another.
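Each point in this plot summarises a whole track with two numbers computed from its section-level tempo estimates: their mean and their standard deviation. The Python sketch below assumes a hypothetical input format, namely a list of per-section tempo values in bpm for each track. It uses the unweighted mean and the sample standard deviation, which is the same definition as R's `sd()`. The plot may have been built with a slightly different summary, for example one that weights sections by duration. The helper `album_tempo_dispersion` is my own hypothetical way to put a number on how "clustered" an album looks along the x-axis.

```python
from statistics import mean, stdev


def tempo_summary(section_tempi):
    """Mean tempo and its sample standard deviation across sections (bpm)."""
    if len(section_tempi) < 2:
        # One section says nothing about variation (R's sd() would give NA).
        return mean(section_tempi), 0.0
    return mean(section_tempi), stdev(section_tempi)


def album_tempo_dispersion(tracks):
    """Sample SD of the per-track mean tempi of one album.

    `tracks` maps track titles to lists of section tempi; a smaller value
    means the album's points sit closer together along the x-axis.
    """
    means = [tempo_summary(tempi)[0] for tempi in tracks.values()]
    return stdev(means)
```
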


Your Body Is a Wonderland, Slow Dancing in a Burning Room and In the Blood self-similarity matrices


These are three self-similarity matrices. The first is of the biggest outlier, Your Body Is a Wonderland. This matrix shows one clear diagonal line, which represents the music unfolding in time. The next noticeable element is the pair of brighter yellow (horizontal and vertical) lines that form a kind of window pane. They indicate that something unexpected happens at this point in time: the overall sound changes a lot. The last element, homogeneity, is actually not very noticeable here. Homogeneity should show the segments of the song that sound similar.

The second self-similarity matrix shows the second most popular track, “Slow Dancing in a Burning Room” (from the album “Continuum”). There are two main differences between this matrix and the one for Your Body Is a Wonderland. First, the change in sound (possibly the bridge) happens earlier in Slow Dancing in a Burning Room. Second, its checkerboard pattern is more clearly visible.

The third self-similarity matrix shows the most popular song on the most recent album, “In the Blood”. The diagonal line again shows the music unfolding in time. The brighter yellow (horizontal and vertical) lines somewhere between 150 and 200 seconds mark a change in the music; in this case, a short guitar solo or bridge. The checkerboard pattern looks clearer here than in the other two self-similarity matrices. This pattern shows homogeneity, that is, segments that sound similar, so it indicates that this track contains multiple similar-sounding segments.
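A self-similarity matrix compares every moment of a track with every other moment. This is why the main diagonal, where each moment is compared with itself, always shows up as a line. The Python sketch below builds such a matrix from a list of feature vectors, for example one chroma or timbre vector per segment, using cosine distance. The input format and function names are assumptions, and the plots above may use a different distance measure.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        # Treat two silent (all-zero) frames as identical, otherwise as maximally different.
        return 0.0 if not any(u) and not any(v) else 1.0
    return 1.0 - dot / norm


def self_similarity(frames):
    """Square, symmetric matrix of pairwise distances between feature frames."""
    return [[cosine_distance(u, v) for v in frames] for u in frames]
```

Consider a track with two sections that each repeat one sound. Its matrix splits into two low-distance blocks on the diagonal and two high-distance blocks off it, which is the simplest form of the checkerboard pattern described above.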


Comparing old and new John Mayer albums

For this corpus, I wanted to compare the older John Mayer albums to the newer ones. First, I compared only the first and the last album to see what kind of differences I could find, especially in the popularity and the valence of the tracks. Next, I made a scatterplot that shows these differences. I then turned it into an interactive scatterplot so that it could show more information, including which point is which song. However, this scatterplot only covered two of the seven albums. So my next step was to include all of the albums and make visuals that show as much information as possible about every album. The first visual is a set of boxplots of track popularity, one per album, placed side by side so that they can be compared. Since track popularity alone is not enough information, I also made a scatterplot. It shows the albums in different colors, together with energy, valence and, of course, track popularity. My goal is to find out whether the older or the newer John Mayer albums are more popular, and if so, why. With all of this information and these visuals, I hope to answer these questions.


So, I will use boxplots and interactive scatterplots to try to find more information about the track popularity, the valence, the energy, the mode, and the loudness of John Mayer’s albums. With this information I hope to find an answer to whether the older or newer albums are more popular and why.


Most hits are on the album Continuum


These boxplots show the track popularity of every John Mayer album. The albums “Continuum” and “The Search for Everything” seem to contain the most popular tracks. However, Continuum contains slightly more popular tracks than The Search for Everything, so according to these boxplots, Continuum can be called the most popular John Mayer album. This album is from 2006, which would suggest that his older music is more popular than his newer music. However, I think the success of this album has less to do with its age than with the guitar skills on its tracks.
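Each box summarises an album's track popularity with its quartiles, and points beyond the whiskers are drawn as outliers. The Python sketch below computes these statistics with the 1.5 × IQR rule that ggplot2's `geom_boxplot()` uses by default. Here `statistics.quantiles(..., method="inclusive")` corresponds to R's default quantile definition (type 7). One way to make "most popular album" precise is to compare medians, the line inside each box; this reading is my assumption. The input format, a dictionary that maps album titles to lists of popularity scores, is also an assumption, and no real scores are included.

```python
from statistics import median, quantiles


def box_stats(scores):
    """Quartiles and 1.5 x IQR outliers for one album (needs at least 2 scores)."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "outliers": sorted(s for s in scores if s < low or s > high),
    }


def most_popular_album(popularity):
    """Album whose tracks have the highest median popularity."""
    return max(popularity, key=lambda album: median(popularity[album]))
```
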


The biggest outlier is the track “Your Body Is a Wonderland”


This interactive scatterplot shows every album in a different color, which makes it easy to see, for example, which albums seem to be more popular and which album contains the most outliers. The x-axis shows the valence, the y-axis shows the track popularity, and the size of the dots shows the energy. The scatterplot makes it easy to see that the biggest outlier (mostly in popularity) is the track “Your Body Is a Wonderland” from Room For Squares, which is also the oldest album in the scatterplot. The other three outliers are the tracks “Slow Dancing In A Burning Room”, “Gravity”, and “Waiting On the World to Change”, which are all on the album Continuum. Although these seem to be the most popular tracks, the plot does not reveal much about their valence or energy. There seems to be no real connection between track popularity and valence and/or energy.
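The impression that popularity has no real connection with valence or energy can be checked with a correlation coefficient. The Python sketch below computes Pearson's r between track popularity and each other feature. It assumes a hypothetical input format: one dictionary per track with the keys `popularity`, `valence` and `energy`. Values close to 0 would support the observation. However, r only captures linear relationships and is sensitive to outliers such as Your Body Is a Wonderland.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def popularity_correlations(tracks, features=("valence", "energy")):
    """Pearson's r between track popularity and each listed feature."""
    popularity = [t["popularity"] for t in tracks]
    return {f: pearson(popularity, [t[f] for t in tracks]) for f in features}
```
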